A 100-Gigabit Highway for Science
Climate researchers are producing some of the fastest-growing datasets in science. Five years ago, the amount of information generated for the Nobel Prize-winning United Nations Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report was 35 terabytes, equivalent to the amount of text in 35 million books, occupying a bookshelf 248 miles (399 km) long. By 2014, when the next IPCC report is published, experts predict that 2 petabytes of data will have been generated for it, roughly 57 times as much as for the previous report.
Because thousands of researchers around the world contribute to the generation and analysis of this data, a reliable, high-speed network is needed to transport the torrent of information. Fortunately, the Department of Energy’s (DOE) ESnet (Energy Sciences Network) has laid the foundation for such a network—not just for climate research, but for all data-intensive science.
“There is a data revolution occurring in science,” says Greg Bell, acting director of ESnet, which is managed by Lawrence Berkeley National Laboratory. “Over the last decade, the amount of scientific data transferred over our network has increased at a rate of about 72 percent per year, and we see that trend potentially accelerating.”
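To put the quoted growth rate in perspective, the short calculation below shows how it compounds. It is only a sketch: it assumes a constant rate of exactly 72 percent per year over exactly ten years, which simplifies the "about 72 percent per year" figure, and the function names are illustrative.

```python
import math


def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplication factor after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed for the quantity to double at a constant annual_rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Assumed inputs: a steady 72 percent annual growth over one decade.
decade_factor = growth_factor(0.72, 10)
years_to_double = doubling_time(0.72)

print(f"Growth over 10 years: about {decade_factor:.0f}-fold")
print(f"Doubling time: about {years_to_double * 12:.0f} months")
```

Under these assumptions, traffic grows more than 200-fold in a decade and doubles a little more often than every year and a half.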
In an effort to spur U.S. scientific competitiveness, as well as accelerate development and widespread deployment of 100-gigabit technology, the Advanced Networking Initiative (ANI) was created with $62 million in funding from the American Recovery and Reinvestment Act (ARRA) and implemented by ESnet. ANI was established to build a 100 Gbps national prototype network and a wide-area network testbed.
To deploy ANI cost-effectively, ESnet partnered with Internet2, a consortium that provides high-performance network connections to universities across America. Internet2 also received a stimulus grant from the Department of Commerce’s Broadband Technology Opportunities Program.
Researchers Take a “Test Drive” on ANI
So far, more than 25 groups have taken advantage of ESnet’s wide-area testbed, which is open to researchers from government agencies and private industry who want to test new, potentially disruptive technologies without interfering with production science network traffic. The testbed currently connects three unclassified DOE supercomputing facilities: the National Energy Research Scientific Computing Center (NERSC) in Oakland, Calif., the Argonne Leadership Computing Facility (ALCF) in Argonne, Ill., and the Oak Ridge Leadership Computing Facility (OLCF) in Oak Ridge, Tenn.
“No other networking organization has a 100-gigabit network testbed that is available to researchers in this way,” says Brian Tierney, who heads ESnet’s Advanced Networking Technologies Group. “Our 100G testbed has been about 80 percent booked since it became available in January, which just goes to show that there are a lot of researchers hungry for a resource like this.”
Climate 100
To ensure that researchers will use future 100-gigabit networks effectively, another ARRA-funded project called Climate 100 brought together middleware and network engineers to develop tools and techniques for moving unprecedentedly massive amounts of climate data.
“Increasing network bandwidth is an important step toward tackling ever-growing scientific datasets, but it is not sufficient by itself; next-generation high-bandwidth networks need to be evaluated carefully from the applications perspective as well,” says Mehmet Balman of Berkeley Lab’s Scientific Data Management group, a member of the Climate 100 collaboration.
According to Balman, climate simulation data consists of a mix of relatively small and large files with irregular file size distribution in each dataset. This requires advanced middleware tools to move data efficiently on long-distance high-bandwidth networks.
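The article does not describe how the Climate 100 tool handles this mix of file sizes. One general approach, sketched below purely as an illustration, is to pack files into fixed-size blocks so that the network always carries uniform units: small files share a block and large files are split across several. The function name, the block size, and the file names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# A chunk identifies a byte range of one file: (file name, offset, length).
Chunk = Tuple[str, int, int]


def pack_into_blocks(
    files: Iterable[Tuple[str, int]], block_size: int
) -> Iterator[List[Chunk]]:
    """Group (name, size_in_bytes) pairs into blocks of at most block_size bytes.

    Small files are packed together; large files are split across blocks.
    Zero-length files contribute no chunks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    block: List[Chunk] = []
    free = block_size  # bytes still available in the current block
    for name, size in files:
        offset = 0
        while offset < size:
            take = min(free, size - offset)
            block.append((name, offset, take))
            offset += take
            free -= take
            if free == 0:  # block is full, hand it to the sender
                yield block
                block, free = [], block_size
    if block:  # final, partially filled block
        yield block


# Hypothetical dataset: two small files around one larger file, sizes in bytes.
dataset = [("a.nc", 3), ("b.nc", 10), ("c.nc", 2)]
blocks = list(pack_into_blocks(dataset, block_size=8))
for i, b in enumerate(blocks):
    print(i, b)
```

With equal-sized blocks, the sender can keep the link busy regardless of how file sizes are distributed, at the cost of reassembling files from chunks at the receiving end.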
“The ANI testbed essentially allowed us to ‘test drive’ on a 100-gigabit network to determine what kind of middleware tools we needed to build to transport climate data,” says Balman. “Once the development was done, we used the testbed to optimize and tune.”
At the 2011 Supercomputing Conference in Seattle, Wash., the Climate 100 team used their tool and the ANI testbed to transport 35 terabytes of climate data from NERSC’s data storage to compute nodes at ALCF and OLCF.
“It took us approximately 30 minutes to move 35 terabytes of climate data over a wide-area 100 Gbps network. This is a great accomplishment,” says Balman. “On a 10 Gbps network, it would have taken five hours to move this much data across the country.”
Approximately 13.7 billion years ago, the Universe was almost homogeneous, meaning that every location in the cosmos was similar. Today, this is no longer the case. This simulation starts from a nearly homogeneous Universe and shows how it has changed over billions of years. Performed on 4,096 cores of NERSC’s ‘Hopper’ system with the Nyx code, this movie was generated with over 5 terabytes of data and was transferred over ESnet to the SC11 Conference exhibit floor in Seattle, Wash., last November. The video on the left shows the simulation streaming on a 10 Gbps link, while the one on the right shows the same model streaming on a 100 Gbps link. These simulations were generated by Prabhat (LBNL).